Authors : Tracy Rabilloud*, Delphine Potier*, Saran Pankew, Mathis Nozais, Marie Loosveld§, Dominique Payet-Bornet§

* Equal contribution

§ Corresponding authors

PMID: to come

Raw data and intermediate data matrices are available in SRA/GEO (SRP269742 / GSE153697)

Docker images and intermediate Seurat objects are available on Zenodo. For any questions about this analysis, please contact Delphine Potier.

Data preprocessing

Demultiplexing results

Cells classification

Demultiplexing of cells based on HTO enrichment using HTOdemux()

Cells classification as singlets, doublets and negative/ambiguous cells
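As a reading aid, the three-way classification can be sketched as a simple threshold rule. This is not the HTOdemux() implementation: HTOdemux() derives its own per-HTO cutoffs, whereas the `thresholds` values, the HTO names and the counts below are hypothetical and chosen only for illustration.

```python
def classify_cell(hto_counts, thresholds):
    """Label one cell from its hashtag (HTO) counts.

    hto_counts and thresholds map an HTO name to a number. Both are
    hypothetical inputs for this sketch; the real cutoffs are estimated
    by HTOdemux() and are not reproduced here.
    """
    positive = sorted(h for h, c in hto_counts.items() if c > thresholds[h])
    if not positive:
        return "Negative", None          # no hashtag above its cutoff
    if len(positive) == 1:
        return "Singlet", positive[0]    # exactly one hashtag: one sample
    return "Doublet", "_".join(positive) # several hashtags: mixed droplet


# Illustrative values only (not taken from the dataset).
thresholds = {"T1-CD19neg": 30, "T1-CD19pos": 30, "T2-CD19neg": 30, "T2-CD19pos": 30}
cells = {
    "cell_a": {"T1-CD19neg": 2, "T1-CD19pos": 250, "T2-CD19neg": 4, "T2-CD19pos": 1},
    "cell_b": {"T1-CD19neg": 120, "T1-CD19pos": 3, "T2-CD19neg": 95, "T2-CD19pos": 0},
    "cell_c": {"T1-CD19neg": 1, "T1-CD19pos": 0, "T2-CD19neg": 2, "T2-CD19pos": 3},
}
labels = {name: classify_cell(counts, thresholds) for name, counts in cells.items()}
```

Only cells labelled "Singlet" can be assigned to a single sample, which is why doublets and negatives are removed in the filtering step.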

Detailed classification

Cells filtering

During sample loading, we remove cells that fail any of the following filters :

I) Filter out low quality cells.

Parameters used in the Seurat CreateSeuratObject function:

  • min.features (named min.genes in Seurat v2): Include cells where at least 200 genes are detected

  • min.cells: Include genes with detected expression in at least 3 cells

After those filters, the remaining cell number is 7058.

II) Filter out doublet and negative cells after HTOdemux sample demultiplexing.

After selecting identified unique cells, 3882 cells remain.

III) Filter out cells with a mitochondrial gene percentage above 10%.

Only the 2919 cells below the red line are kept (mitochondrial percentage < 0.1).
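The three cell-level filters above can be sketched as one pass over per-cell metadata. This is a minimal illustration, not the code used in the analysis: the field names (`n_genes`, `hto_class`, `mito_fraction`) are hypothetical, and the gene-level min.cells filter is left out because it acts on genes rather than cells.

```python
def filter_cells(cells, min_genes=200, max_mito=0.1):
    """Return barcodes passing filters I to III.

    cells is a list of dicts with hypothetical keys:
      barcode, n_genes, hto_class, mito_fraction (0 to 1).
    """
    kept = []
    for cell in cells:
        if cell["n_genes"] < min_genes:        # I) low quality cell
            continue
        if cell["hto_class"] != "Singlet":     # II) doublet or negative
            continue
        if cell["mito_fraction"] >= max_mito:  # III) mitochondrial percentage
            continue
        kept.append(cell["barcode"])
    return kept


# Illustrative records only.
example = [
    {"barcode": "AAAC", "n_genes": 1500, "hto_class": "Singlet", "mito_fraction": 0.04},
    {"barcode": "AAAG", "n_genes": 150, "hto_class": "Singlet", "mito_fraction": 0.02},
    {"barcode": "AAAT", "n_genes": 2100, "hto_class": "Doublet", "mito_fraction": 0.03},
    {"barcode": "AACC", "n_genes": 900, "hto_class": "Singlet", "mito_fraction": 0.25},
]
passing = filter_cells(example)
```

Applied in this order, each step can only shrink the cell set, which matches the decreasing counts reported above (7058, then 3882, then 2919).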

Mitochondrial percentage versus nFeatures

Number of features and counts by sample (violin plot)

As expected, samples containing mostly tumoral cells (T1-CD19pos and T2-CD19neg) show a higher number of expressed genes.

UMAP colored by samples

Samples UMAP

Figure 1D : UMAP visualization of the 4 demultiplexed samples : T1-CD19neg, T1-CD19pos, T2-CD19neg & T2-CD19pos

Data analysis

Coarse-grained clustering

Clusters resolution 0.1

Clusters UMAP

Figure 1C : UMAP visualization of the 6 main clusters

Known markers expression

Figure 1B : Dotplot showing the expression level of marker genes in each cluster

  • B-ALL markers: CD34+; RPS14low

  • B-cell markers: CD19+; CD79A+; CD79B+

  • Hematogone maturation markers: immature: CD10 (MME) up; mature: CD20 (MS4A1) up

  • NK-cell markers: NKG7; GNLY; KLRD1; GZMB; KLRC1

  • Myeloid cell markers: LYZ; CD68; LGALS3; CD14; CD33

N.B. Some of these markers are specific to monocytes/macrophages; therefore the upper part of cluster 2, which is mostly composed of myeloid progenitors (CMP / MEP / GMP / pro-myelocytes), does not show high expression levels.
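For readers unfamiliar with dot plots, each dot conventionally summarizes two numbers per gene and cluster: the percentage of cells in which the gene is detected and the average expression. The sketch below computes both from a toy input; the data structures and values are hypothetical, and any scaling applied by the plotting function is not reproduced.

```python
from collections import defaultdict


def dotplot_summary(expression, clusters, gene):
    """Per-cluster (percent of cells expressing, mean expression) for one gene.

    expression: barcode -> {gene: normalized value}   (hypothetical layout)
    clusters:   barcode -> cluster id
    """
    values = defaultdict(list)
    for barcode, cluster in clusters.items():
        values[cluster].append(expression[barcode].get(gene, 0.0))
    summary = {}
    for cluster, vals in values.items():
        pct = 100.0 * sum(v > 0 for v in vals) / len(vals)
        avg = sum(vals) / len(vals)
        summary[cluster] = (pct, avg)
    return summary


# Illustrative values only.
expression = {
    "c1": {"CD19": 2.0},
    "c2": {},
    "c3": {"NKG7": 3.0},
    "c4": {"NKG7": 1.0},
}
clusters = {"c1": 4, "c2": 4, "c3": 5, "c4": 5}
cd19 = dotplot_summary(expression, clusters, "CD19")
```

A marker restricted to a subset of a cluster, such as the monocyte/macrophage genes in cluster 2, lowers the cluster-wide percentage and average in exactly this way.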

UMAP visualization of markers expression

Interactive UMAP : clusters vs samples

DEGs analysis

Differential gene expression analysis:

We search for markers specific to each cluster

  • Table
  • UMAPs

The following plots show the expression of each cluster’s top marker based on avg_logFC.
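Selecting the top marker of each cluster amounts to taking, per cluster, the row with the highest avg_logFC in the marker table. A minimal sketch follows; the table layout (one dict per gene/cluster pair) and the placeholder gene names and values are assumptions made for illustration.

```python
def top_marker_per_cluster(markers):
    """Return {cluster: gene} for the row with the highest avg_logFC per cluster.

    markers is a list of dicts with keys cluster, gene, avg_logFC
    (a hypothetical flat version of a marker table).
    """
    best = {}
    for row in markers:
        current = best.get(row["cluster"])
        if current is None or row["avg_logFC"] > current["avg_logFC"]:
            best[row["cluster"]] = row
    return {cluster: row["gene"] for cluster, row in best.items()}


# Placeholder genes and values, not results from this analysis.
marker_table = [
    {"cluster": 0, "gene": "geneA", "avg_logFC": 1.2},
    {"cluster": 0, "gene": "geneB", "avg_logFC": 0.8},
    {"cluster": 1, "gene": "geneC", "avg_logFC": 0.5},
    {"cluster": 1, "gene": "geneD", "avg_logFC": 2.1},
]
top = top_marker_per_cluster(marker_table)
```

Ranking on avg_logFC alone favours genes with a large effect size; it does not take the adjusted p-value into account.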

## [1] "PMAIP1 Promotes activation of caspases and apoptosis. Promotes mitochondrial membrane changes and efflux of apoptogenic proteins from the mitochondria. Contributes to p53/TP53-dependent apoptosis after radiation exposure."

## [1] "AC245014.3 is a lincRNA, MUC20 overlapping transcript"

## [1] "PRDX1 : This gene encodes a member of the peroxiredoxin family of antioxidant enzymes, which reduce hydrogen peroxide and alkyl hydroperoxides. The encoded protein may play an antioxidant protective role in cells, and may contribute to the antiviral activity of CD8(+) T-cells. This protein may have a proliferative effect and play a role in cancer development or progression."

## [1] "STMN1 : part of the cell cycle gene list;  \nThis gene belongs to the stathmin family of genes. It encodes a ubiquitous cytosolic phosphoprotein proposed to function as an intracellular relay integrating regulatory signals of the cellular environment. The encoded protein is involved in the regulation of the microtubule filament system by destabilizing microtubules. It prevents assembly and promotes disassembly of microtubules."

## [1] "S100A8 : Monocytes marker"

## [1] "S100A9 : Monocytes marker"

## [1] "VPREB1 : this gene belongs to the immunoglobulin superfamily and is expressed selectively at the early stages of B cell development, namely, in proB and early preB cells. This gene encodes the iota polypeptide chain that is associated with the Ig-mu chain to form a molecular complex which is expressed on the surface of pre-B cells. The complex is thought to regulate Ig gene rearrangements in the early steps of B-cell differentiation. Alternative splicing results in multiple transcript variants."

## [1] "DNTT : This gene is a member of the DNA polymerase type-X family and encodes a template-independent DNA polymerase that catalyzes the addition of deoxynucleotides to the 3'-hydroxyl terminus of oligonucleotide primers. In vivo, the encoded protein is expressed in a restricted population of normal and malignant pre-B and pre-T lymphocytes during early differentiation, where it generates antigen receptor diversity by synthesizing non-germ line elements (N-regions) at the junctions of rearranged Ig heavy chain and T cell receptor gene segments."

## [1] "IGKC"

## [1] "IGLL5 :  This gene encodes one of the immunoglobulin lambda-like polypeptides. It is located within the immunoglobulin lambda locus but it does not require somatic rearrangement for expression. The first exon of this gene is unrelated to immunoglobulin variable genes; the second and third exons are the immunoglobulin lambda joining 1 and the immunoglobulin lambda constant 1 gene segments."

## [1] "GNLY :  The product of this gene is a member of the saposin-like protein (SAPLIP) family and is located in the cytotoxic granules of T cells, which are released upon antigen stimulation. This protein is present in cytotoxic granules of cytotoxic T lymphocytes and natural killer cells, and it has antimicrobial activity against M. tuberculosis and other organisms."

## [1] "NKG7 : natural killer cell group 7 sequence"

Cell cycle

S score

G2M score

Phases
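The phase label can be derived from the two scores with a simple rule. The sketch below assumes the rule applied by Seurat's CellCycleScoring(): a cell is G1 when both scores are negative, otherwise it takes the phase with the higher score, and ties are left undecided. Treat it as an illustration of that assumed rule rather than as the code run here.

```python
def assign_phase(s_score, g2m_score):
    """Assign a cell cycle phase from S and G2M scores (assumed rule)."""
    if s_score < 0 and g2m_score < 0:
        return "G1"            # neither programme is active
    if s_score == g2m_score:
        return "Undecided"     # tie between the two scores
    return "S" if s_score > g2m_score else "G2M"


# Illustrative scores only.
phases = [assign_phase(s, g) for s, g in [(-0.2, -0.1), (0.4, 0.1), (0.05, 0.6)]]
```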

Clustering vs Sample ID

Cross-tabulated table :

Number of cells by samples in each cluster

% of cells by samples in each cluster
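Both tables can be derived from two parallel vectors of cluster and sample labels. In the sketch below the percentages are normalized within each cluster, which is one reading of the table title and is an assumption; the labels used are illustrative.

```python
from collections import Counter


def cross_tabulate(clusters, samples):
    """Counts and within-cluster percentages for each (cluster, sample) pair."""
    counts = Counter(zip(clusters, samples))
    cluster_totals = Counter(clusters)
    percents = {
        key: 100.0 * n / cluster_totals[key[0]] for key, n in counts.items()
    }
    return counts, percents


# Illustrative labels only, one entry per cell.
cluster_ids = [0, 0, 0, 1]
sample_ids = ["T1-CD19pos", "T1-CD19pos", "T1-CD19neg", "T2-CD19neg"]
counts, percents = cross_tabulate(cluster_ids, sample_ids)
```

Normalizing within each sample instead only requires dividing by a Counter built from `samples` and indexing it with `key[1]`.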

Clones backtracking

Figure 2 related

All clones

Clones selected for backtracking

T1 CD19neg clones in T1 tumoral area

T1 CD19neg clones in T2 tumoral area

T2 CD19neg clones in T1 tumoral area

Positive control

Negative control

Summary table

Table summarizing information for cells of interest

Extra DEG analysis

B-ALL cells (Cluster 0 & 1) : T1 CD19neg cells versus T1 CD19pos cells

We selected B-ALL cells (clusters 0 & 1) to compare the transcriptomes of T1 CD19 negative and CD19 positive cells :

We did not detect “significant” differences between cluster 0 T1 CD19 negative and CD19 positive cells, except for SPAST. This result should be interpreted with caution because there are only 20 T1 CD19 negative cells.

For comparison, we performed an additional differential expression analysis.

1 - Cluster 0 T1 CD19 negative cells to Cluster 1 T2 CD19 negative cells :

We found more differences than in the previous comparison (cluster 0 : T1 CD19 negative versus CD19 positive cells). These differences are similar to those found when comparing T1 CD19pos and T2 CD19neg cells from clusters 0 and 1 (see below, e.g. TSC22D3, KLF6, PMAIP1…)

B-ALL cells (Cluster 0 & 1) versus other cell types

Related to Figure 2B and Supplementary Figure 5

1 - Comparison of B-ALL cells (excluding the 20 B-ALL T1 CD19neg cells) to other cell types

2 - Visualization of the expression profile of differentially expressed genes across clusters of interest (highlighting the 20 B-ALL T1 CD19neg cells)

B-ALL versus Physiological B-cells

  • B-ALL (clusters 0 & 1) versus Physiological immature B-cells (hematogones, cluster 3)

  • B-ALL (clusters 0 & 1) versus Physiological mature B-cells (cluster 4)

Common top genes differentially expressed in:

1 - tumoral cells (clusters 0 & 1, excluding the 20 T1 CD19neg cells) and immature B-cells (cluster 3)

and

2 - tumoral cells (clusters 0 & 1, excluding the 20 T1 CD19neg cells) and mature B-cells (cluster 4)

##  [1] "CD34"       "CLTC"       "HSPB1"      "AC245014.3" "SLC2A3"    
##  [6] "CLEC2B"     "PDLIM1"     "YBX1"       "CD164"      "PMAIP1"    
## [11] "TCL1B"      "WDR74"      "CD44"       "Z93241.1"   "MSH6"      
## [16] "GNA15"      "AC103591.3" "HSP90AB1"   "RHOB"       "AC007952.4"
## [21] "PRDX1"      "TSC22D3"    "AKAP13"     "AC091271.1" "PTCH2"     
## [26] "ZFP36L2"    "RASD1"      "DDIT4"
##  [1] "RUBCNL"   "NCF1"     "RPL7A"    "CD79B"    "RPS6"     "RPL12"   
##  [7] "CD9"      "RPL35"    "IGHV5-78" "POU2AF1"

Heatmap showing the expression level of genes from the above list.

  • Clusters 0, 1 & 3 were down-sampled to 100 cells for better readability.

  • Full heatmap
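Down-sampling a cluster to 100 cells is a random draw without replacement, with smaller clusters left untouched. The sketch below uses a fixed seed so that the same cells are drawn on every run; the seed value, the function name and the input layout are assumptions, not the settings used for the figures.

```python
import random


def downsample(cells_by_cluster, clusters_to_cap, max_cells=100, seed=0):
    """Cap the listed clusters at max_cells barcodes; keep others whole."""
    rng = random.Random(seed)  # fixed seed for a reproducible draw
    result = {}
    for cluster, barcodes in cells_by_cluster.items():
        if cluster in clusters_to_cap and len(barcodes) > max_cells:
            result[cluster] = rng.sample(barcodes, max_cells)
        else:
            result[cluster] = list(barcodes)
    return result


# Illustrative barcodes only.
toy = {
    0: [f"c0_{i}" for i in range(500)],
    3: [f"c3_{i}" for i in range(150)],
    4: [f"c4_{i}" for i in range(40)],
}
capped = downsample(toy, clusters_to_cap={0, 1, 3})
```

Capping the large clusters keeps them from dominating the heatmap width, while the full heatmap retains every cell.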

B-ALL versus Myeloid cells

Heatmap showing the expression level of the top 100 differentially expressed genes (50 up- and 50 down-regulated genes) between tumoral cells (clusters 0 & 1, excluding the 20 T1 CD19neg cells) and myeloid cells (cluster 2).

  • Clusters 0, 1 & 2 were down-sampled to 100 cells for better readability.

  • Full heatmap
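Picking the 50 up- and 50 down-regulated genes can be sketched as two sorted selections on the differential expression table. Ranking by avg_logFC is an assumption here (the ranking criterion is not stated above), and the table layout is hypothetical.

```python
def top_up_down(deg_table, n=50):
    """Return (up, down) gene lists: the n strongest positive and negative
    avg_logFC values. deg_table is a list of dicts with keys gene, avg_logFC."""
    up = sorted(
        (r for r in deg_table if r["avg_logFC"] > 0),
        key=lambda r: r["avg_logFC"],
        reverse=True,
    )[:n]
    down = sorted(
        (r for r in deg_table if r["avg_logFC"] < 0),
        key=lambda r: r["avg_logFC"],
    )[:n]
    return [r["gene"] for r in up], [r["gene"] for r in down]


# Placeholder genes and values, not results from this analysis.
table = [
    {"gene": "g1", "avg_logFC": 1.5},
    {"gene": "g2", "avg_logFC": -2.0},
    {"gene": "g3", "avg_logFC": 0.3},
    {"gene": "g4", "avg_logFC": -0.4},
]
up_genes, down_genes = top_up_down(table, n=1)
```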

B-ALL versus NK cells

Heatmap showing the expression level of the top 100 differentially expressed genes (50 up- and 50 down-regulated genes) between tumoral cells (clusters 0 & 1, excluding the 20 T1 CD19neg cells) and NK cells (cluster 5).

  • Clusters 0 & 1 were down-sampled to 100 cells for better readability.

  • Full heatmap

  • Dotplot showing markers expression across clusters, except that the 20 T1 CD19neg cells present in B-ALL clusters are shown independently. (Figure 2C)

Common tumoral markers

Comparison of B-ALL cells (cluster 0 and 1) to physiological cells (clusters 2, 3, 4 & 5)

Tumoral progression

We selected clusters of interest (0 and 1) to search for tumoral progression markers between T1 CD19pos and T2 CD19neg: KLF6 is the most differentially expressed transcription factor.

  • CD19 is differentially expressed between T1 B-ALL and T2 B-ALL : Figure S4B

SCENIC

Gene regulatory network inference and individual cells network/regulon activity scoring

We checked whether the regulon activity of the transcription factor most differentially expressed between B-ALL cells at T1 and T2 was affected

KLF6

KLF6 mRNA expression level

KLF6 regulon activity (calculated with SCENIC)

Browse by figures

Main figures

Figure 1B

Dotplot showing the expression level of marker genes in each cluster

Figure 1C

UMAP visualization of clusters (resolution 0.1)

Figure 1D

UMAP visualization of the 4 demultiplexed samples

Figure 2A

Zoom on clusters 0 and 1, UMAP plot focused on tumoral cells colored according to the sample of origin. T1 CD19neg cells for which CD19 transcript presence was tested by PCR are labelled.

  • All cells for which CD19 transcript presence was tested by PCR are labelled.

Figure 2B

Common top genes differentially expressed in :

1 - tumoral cells (clusters 0 & 1, excluding the 20 T1 CD19neg cells) and hematogones (cluster 3)

and

2 - tumoral cells (clusters 0 & 1, excluding the 20 T1 CD19neg cells) and mature B-cells (cluster 4)
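The "common" genes are those present in both comparisons, i.e. the intersection of the two differential expression gene lists. A minimal sketch, keeping the order of the first list; the function name and the placeholder gene names are illustrative, and any additional criterion (direction of change, thresholds) is not reproduced.

```python
def common_genes(first, second):
    """Genes present in both lists, in the order of the first list."""
    second_set = set(second)
    return [gene for gene in first if gene in second_set]


# Placeholder gene names, not results from this analysis.
vs_hematogones = ["g1", "g2", "g3", "g4"]
vs_mature_b = ["g4", "g2", "g5"]
shared = common_genes(vs_hematogones, vs_mature_b)
```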

Figure 2C

Dotplot showing markers expression across clusters, except that the 20 T1 CD19neg cells present in B-ALL clusters are shown independently.

Supplementary figures

Figure S3

UMAP visualization of KLF6 mRNA expression and regulon activity

Figure S4

Figure S4A (interactive)

UMAP visualization of CD19 mRNA

Figure S4B

Violin plot of CD19 mRNA in T1-CD19pos (blue) and T2-CD19neg (gold) cells present in cluster 0 or 1

Percentage of cells expressing CD19 :

  • T1-CD19pos : 73.07 % of cells expressing CD19
  • T2-CD19neg : 49.65 % of cells expressing CD19

CD19 is differentially expressed in tumoral T1-CD19pos cells compared to tumoral T2-CD19neg cells (Wilcoxon rank sum test : average log FC 0.55 ; adjusted p-value 7.1e-51, see “Extra DEG analysis” section)
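To help interpret the average log FC, the sketch below assumes the definition used by FindMarkers() in Seurat v3: log-normalized values are converted back with expm1, averaged per group, a pseudocount of 1 is added, and the difference of natural logs is taken. This is an assumption about the convention, not a re-computation of the value reported above, and the input values are illustrative.

```python
import math


def avg_log_fc(group1, group2, pseudocount=1.0):
    """Natural-log fold change between two groups of log1p-normalized values
    (assumed Seurat v3 convention)."""
    def log_mean(values):
        mean_linear = sum(math.expm1(v) for v in values) / len(values)
        return math.log(mean_linear + pseudocount)

    return log_mean(group1) - log_mean(group2)


# Illustrative values only: group means of 3 and 1 on the linear scale.
fc = avg_log_fc([math.log1p(3.0)], [math.log1p(1.0)])
```

Under this convention, a value of 0.55 would correspond to a ratio of about exp(0.55) ≈ 1.7 between the pseudocount-shifted group means.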

Figure S5

Figure S5A : B-ALL versus Myeloid cells

Heatmap showing the expression level of the top 100 differentially expressed genes (50 up- and 50 down-regulated genes) between tumoral cells (clusters 0 & 1, excluding the 20 T1 CD19neg cells) and myeloid cells (cluster 2)

  • Clusters 0, 1 & 2 were down-sampled to 100 cells for better readability.

Figure S5B : B-ALL versus NK cells

Heatmap showing the expression level of the top 100 differentially expressed genes (50 up- and 50 down-regulated genes) between tumoral cells (clusters 0 & 1, excluding the 20 T1 CD19neg cells) and NK cells (cluster 5).

  • Clusters 0 & 1 were down-sampled to 100 cells for better readability.

Figure S6

Figure S7

Zoom on clusters 0 and 1, UMAP plot focused on tumoral cells colored according to the sample of origin. T2 CD19neg cells are highlighted and cells for which CD19 transcript presence was tested by PCR are labelled.

Session info

## R version 3.5.3 (2019-03-11)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.5 LTS
## 
## Matrix products: default
## BLAS/LAPACK: /usr/lib/libopenblasp-r0.2.18.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C             
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] knitr_1.23         RColorBrewer_1.1-2 magrittr_1.5      
## [4] dplyr_0.8.1        plotly_4.9.0       ggplot2_3.1.1     
## [7] DT_0.6.2           Seurat_3.0.1      
## 
## loaded via a namespace (and not attached):
##  [1] nlme_3.1-140        tsne_0.1-3          bitops_1.0-6       
##  [4] httr_1.4.0          sctransform_0.2.0   tools_3.5.3        
##  [7] R6_2.4.0            irlba_2.3.3         KernSmooth_2.23-15 
## [10] lazyeval_0.2.2      colorspace_1.4-1    npsurv_0.4-0       
## [13] withr_2.1.2         tidyselect_0.2.5    gridExtra_2.3      
## [16] compiler_3.5.3      labeling_0.3        caTools_1.17.1.2   
## [19] scales_1.0.0        lmtest_0.9-37       ggridges_0.5.1     
## [22] pbapply_1.4-0       stringr_1.4.0       digest_0.6.19      
## [25] rmarkdown_1.12      R.utils_2.8.0       pkgconfig_2.0.2    
## [28] htmltools_0.3.6     bibtex_0.4.2        htmlwidgets_1.3    
## [31] rlang_0.3.4         shiny_1.3.2         zoo_1.8-5          
## [34] jsonlite_1.6        crosstalk_1.0.0     ica_1.0-2          
## [37] gtools_3.8.1        R.oo_1.22.0         Matrix_1.2-17      
## [40] Rcpp_1.0.1          munsell_0.5.0       ape_5.3            
## [43] reticulate_1.12     R.methodsS3_1.7.1   stringi_1.4.3      
## [46] yaml_2.2.0          gbRd_0.4-11         MASS_7.3-51.4      
## [49] gplots_3.0.1.1      Rtsne_0.15          plyr_1.8.4         
## [52] grid_3.5.3          promises_1.0.1      parallel_3.5.3     
## [55] gdata_2.18.0        listenv_0.7.0       ggrepel_0.8.1      
## [58] crayon_1.3.4        lattice_0.20-38     cowplot_0.9.4      
## [61] splines_3.5.3       SDMTools_1.1-221.1  pillar_1.4.0       
## [64] igraph_1.2.4.1      future.apply_1.2.0  reshape2_1.4.3     
## [67] codetools_0.2-16    glue_1.3.1          evaluate_0.13      
## [70] lsei_1.2-0          metap_1.1           data.table_1.12.2  
## [73] httpuv_1.5.1        png_0.1-7           Rdpack_0.11-0      
## [76] gtable_0.3.0        RANN_2.6.1          purrr_0.3.2        
## [79] tidyr_0.8.3         future_1.13.0       assertthat_0.2.1   
## [82] xfun_0.7            mime_0.6            rsvd_1.0.0         
## [85] xtable_1.8-4        later_0.8.0         survival_2.44-1.1  
## [88] viridisLite_0.3.0   tibble_2.1.1        cluster_2.0.9      
## [91] globals_0.12.4      fitdistrplus_1.0-14 ROCR_1.0-7